    TopCom: Index for Shortest Distance Query in Directed Graph

    Finding the shortest distance between two vertices in a graph is an important problem due to its numerous applications in diverse domains, including geo-spatial databases, social network analysis, and information retrieval. Classical algorithms (such as Dijkstra's) solve this problem in polynomial time, but they cannot provide real-time responses to a large number of bursty queries on a large graph. Indexing-based solutions, which pre-process the graph so that a large number of distance queries can be answered (exactly or approximately) in real time, are therefore becoming increasingly popular. Existing solutions vary in index size, index building time, query time, and accuracy. In this work, we propose TopCom, a novel indexing-based solution for exactly answering distance queries. Our experiments with two existing state-of-the-art methods (IS-Label and TreeMap) show the superiority of TopCom over these two methods in scalability and query time. In addition, the indexing of TopCom exploits the DAG (directed acyclic graph) structure in the graph, which makes it significantly faster than the existing methods if the strongly connected components (SCCs) of the input graph are relatively small.
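For orientation, the classical baseline that the abstract contrasts with indexing can be sketched in a few lines. This is a minimal Dijkstra implementation over an adjacency-list dictionary, not the TopCom index itself; the `toy_graph` data and the function name `dijkstra` are illustrative assumptions, and edge weights are assumed to be non-negative.

```python
import heapq


def dijkstra(graph, source):
    """Return shortest distances from `source` to every reachable vertex.

    `graph` maps a vertex to a list of (neighbor, weight) pairs for its
    outgoing edges; weights are assumed to be non-negative.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path to u was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical directed toy graph used only for illustration.
toy_graph = {
    "a": [("b", 4), ("c", 1)],
    "b": [("d", 1)],
    "c": [("b", 2), ("d", 7)],
    "d": [],
}
distances = dijkstra(toy_graph, "a")
```

Each call explores the graph from scratch, which is the per-query cost that an index built ahead of time is meant to avoid.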

    FS^3: A Sampling based method for top-k Frequent Subgraph Mining

    Mining labeled subgraphs is a popular research task in data mining because of its potential application in many different scientific domains. All the existing methods for this task explicitly or implicitly solve the subgraph isomorphism problem, which is computationally expensive, so they scale poorly when the graphs in the input database are large. In this work, we propose FS^3, a sampling-based method. It mines a small collection of subgraphs that are most frequent in the probabilistic sense. FS^3 performs Markov Chain Monte Carlo (MCMC) sampling over the space of fixed-size subgraphs such that the potentially frequent subgraphs are sampled more often. In addition, FS^3 is equipped with an innovative queue manager. It stores the sampled subgraphs in a finite queue over the course of mining in such a manner that the top-k positions in the queue contain the most frequent subgraphs. Our experiments on databases of large graphs show that FS^3 is efficient, and that it obtains subgraphs that are the most frequent among the subgraphs of a given size.

    Con-S2V: A Generic Framework for Incorporating Extra-Sentential Context into Sen2Vec

    We present a novel approach to learning distributed representations of sentences from unlabeled data by modeling both the content and the context of a sentence. The content model learns a sentence representation by predicting its words. The context model, on the other hand, comprises a neighbor prediction component and a regularizer to model the distributional and proximity hypotheses, respectively. We propose an online algorithm to train the model components jointly. We evaluate the models in a setup where contextual information is available. The experimental results on tasks involving classification, clustering, and ranking of sentences show that our model outperforms the best existing models by a wide margin across multiple datasets.

    Name Disambiguation from link data in a collaboration graph using temporal and topological features

    In a social community, multiple persons may share the same name, phone number, or some other identifying attribute. This, along with other phenomena such as name abbreviation, name misspelling, and human error, leads to erroneous aggregation of records of multiple persons under a single reference. Such mistakes degrade the performance of document retrieval, web search, and database integration and, more importantly, cause improper attribution of credit (or blame). The task of entity disambiguation partitions the records belonging to multiple persons with the objective that each partition is composed of records of a unique person. Existing solutions to this task use either biographical attributes or auxiliary features that are collected from external sources, such as Wikipedia. However, in many scenarios such auxiliary features are not available, or they are costly to obtain. Moreover, collecting biographical or external data carries the risk of privacy violation. In this work, we propose a method for solving the entity disambiguation task from link information obtained from a collaboration network. Our method is non-intrusive of privacy as it uses only the time-stamped graph topology of an anonymized network. Experimental results on two real-life academic collaboration networks show that the proposed method has satisfactory performance. Comment: The short version of this paper has been accepted to ASONAM 201

    Incremental eigenpair computation for graph Laplacian matrices: theory and applications

    The smallest eigenvalues and the associated eigenvectors (i.e., eigenpairs) of a graph Laplacian matrix have been widely used for spectral clustering and community detection. However, in real-life applications, the number of clusters or communities (say, K) is generally unknown a priori. Consequently, the majority of the existing methods either choose K heuristically or repeat the clustering method with different choices of K and accept the best clustering result. The first option often yields a suboptimal result, while the second option is computationally expensive. In this work, we propose an incremental method for constructing the eigenspectrum of the graph Laplacian matrix. This method leverages the eigenstructure of the graph Laplacian matrix to obtain the Kth smallest eigenpair of the Laplacian matrix given a collection of all previously compute
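As background on why the smallest eigenpairs carry cluster information, the sketch below builds the unnormalized Laplacian L = D - A of a small undirected graph and checks that the indicator vector of each connected component is mapped to zero, i.e. it is an eigenvector with eigenvalue 0. It does not implement the incremental method of the abstract; the helper names `laplacian` and `matvec` and the two-triangle graph are illustrative assumptions.

```python
def laplacian(n, edges):
    """Build the unnormalized Laplacian L = D - A of an undirected graph.

    `n` is the number of vertices (labelled 0..n-1) and `edges` is a list
    of (i, j) pairs; the result is a list of rows.
    """
    L = [[0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1  # degree contribution
        L[j][j] += 1
        L[i][j] -= 1  # adjacency contribution
        L[j][i] -= 1
    return L


def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


# Hypothetical graph: two disjoint triangles, {0, 1, 2} and {3, 4, 5}.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
L = laplacian(6, edges)

indicator_a = [1, 1, 1, 0, 0, 0]  # membership of the first triangle
indicator_b = [0, 0, 0, 1, 1, 1]  # membership of the second triangle
residual_a = matvec(L, indicator_a)  # all zeros: eigenvalue 0
residual_b = matvec(L, indicator_b)  # all zeros: eigenvalue 0
mixed = matvec(L, [1, 1, 0, 0, 0, 0])  # not a component indicator: nonzero
```

The multiplicity of the eigenvalue 0 equals the number of connected components, which is the idealized version of the link between the low end of the spectrum and the number of clusters K.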

    Honey bee foraging: persistence to non-rewarding feeding locations and waggle dance communication

    The honey bee, Apis mellifera, is important in agriculture and also as a model species in scientific research. This Master’s thesis focuses on honey bee foraging behaviour. It contains two independent experiments, each on a different subject within the area of foraging. Both use a behavioural ecology approach, with one investigating foraging behaviour and the other foraging communication. These form chapters 2 and 3 of the thesis, after an introductory chapter. Chapter 2. Experiment 1: Persistence to unrewarding feeding locations by forager honey bees (Apis mellifera): the effects of experience, resource profitability, and season. This study shows that the persistence of honey bee foragers to unrewarding food sources, measured both in duration and number of visits, was greater to locations that previously offered sucrose solution of higher concentration (2 versus 1 molar) or were closer to the hive (20 versus 450 m). Persistence was also greater in bees that had longer access to the feeder before the syrup was terminated (2 versus 0.5 h). These results indicate that persistence is greater for more rewarding locations. However, persistence was not higher in the season of lowest nectar availability in the environment. Chapter 3. Experiment 2: Honey bee waggle dance communication: signal meaning and signal noise affect dance follower behaviour. This study shows that honey bee foragers follow fewer waggle runs as the distance to the food source advertised by the dance increases, but invest more time in following these dances. This is because waggle run duration increases with increasing foraging distance. The number of waggle runs followed for distant food sources was further reduced by increased angular noise among waggle runs within a dance. The number of dance followers per dancing bee was affected by the time of year and varied among colonies. Both noise in the message, that is, variation in the direction component, and the message itself, that is, the distance of the advertised food location, affect dance following. These results indicate that dance followers pay attention to the costs and benefits associated with using dance information.

    The Effectiveness of Educational Games on Scientific Concepts Acquisition in First Grade Students in Science

    This study aimed to investigate the effectiveness of educational games on the acquisition of scientific concepts by first-grade students. The sample consisted of 53 male and female students distributed into two groups: an experimental group (n=26), which was taught with educational games, and a control group (n=27), which was taught by the traditional method. To achieve the purpose of the study, the researcher developed a teaching guide that included eight educational games, and a test to measure scientific concepts acquisition. Results showed that there were statistically significant differences in students’ scientific concepts acquisition due to the method of teaching, in favor of the experimental group. There were no statistically significant differences in students’ scientific concepts acquisition due to gender or to the interaction between method of teaching and gender. The study recommended using educational games in teaching science in primary education. Keywords: Educational Games, Scientific Concepts, Science

    The degree of career polarization among educational leaders in the Jordanian Education Directorates

    The study aimed to identify the degree of career polarization among educational leaders in the Jordanian education directorates of Ajloun and Jersah. The researchers adopted the descriptive-analytical approach for its suitability for such studies. The study instrument was a questionnaire comprising 20 items distributed across two domains, with 10 items per domain. The sample of the study comprised 250 educational leaders for the first semester of the academic year 2019-2020. The study results showed that the degree of career polarization among educational leaders in the Jordanian Ministry of Education was rated as average in all its domains and for all items. The results also showed that there were no statistically significant differences at the significance level (α=0.05) attributable to the two study variables, gender and number of years of experience.

    Name Disambiguation in Anonymized Graphs using Network Embedding

    In the real world, our DNA is unique but many people share names. This phenomenon often causes erroneous aggregation of documents of multiple persons who are namesakes of one another. Such mistakes degrade the performance of document retrieval and web search and, more seriously, cause improper attribution of credit or blame in digital forensics. To resolve this issue, the name disambiguation task aims to partition the documents associated with a name reference such that each partition contains documents pertaining to a unique real-life person. Existing solutions to this task rely substantially on feature engineering, such as biographical feature extraction or construction of auxiliary features from Wikipedia. However, in many scenarios such features may be costly to obtain or unavailable due to the risk of privacy violation. In this work, we propose a novel name disambiguation method. Our proposed method is non-intrusive of privacy because, instead of using attributes pertaining to a real-life person, it leverages only relational data in the form of anonymized graphs. Methodologically, the proposed method uses a novel representation learning model to embed each document in a low-dimensional vector space where name disambiguation can be solved by a hierarchical agglomerative clustering algorithm. Our experimental results demonstrate that the proposed method is significantly better than the existing name disambiguation methods working in a similar setting.
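The final clustering step mentioned in the abstract can be illustrated with a small, generic sketch. It assumes document embeddings are already available as coordinate tuples and applies average-linkage hierarchical agglomerative clustering until a requested number of groups remains. The embedding model itself is not shown; the linkage choice, the `agglomerative` function, and the toy `embeddings` are assumptions made for illustration rather than details taken from the paper.

```python
import math


def agglomerative(points, num_clusters):
    """Average-linkage agglomerative clustering.

    `points` is a list of equal-length coordinate tuples. Returns a list
    of clusters, each a list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    target = max(1, num_clusters)

    def linkage(a, b):
        # Mean pairwise Euclidean distance between two clusters.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > target:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge the closest pair
        del clusters[y]
    return clusters


# Hypothetical 2-D embeddings of five documents sharing one name reference.
embeddings = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05), (5.0, 5.1), (5.1, 5.0)]
groups = agglomerative(embeddings, 2)
```

Here each resulting group stands for the documents attributed to one real-life person. This naive version recomputes all pairwise linkages on every merge, which is acceptable for a toy example but not for large document collections.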